Ellipsoid calculations versus manual tumor delineations for glioblastoma tumor volume evaluation

In glioblastoma, the assessment of treatment response is essentially based on the evolution of 2D tumor size, but this approach remains disputable. Volumetric approaches have been evaluated for a more accurate estimation of tumor size. This study included 57 patients and compared two volume measurement methods to determine the size of different glioblastoma regions of interest: the contrast-enhancing area, the necrotic area, the gross target volume and the volume of the edema area. The two methods, the ellipsoid formula (the calculated method) and manual delineation (the measured method), showed a high correlation in determining glioblastoma volume and a high agreement in classifying patients' response to treatment according to RANO criteria. This study revealed that calculated and measured methods could be used in clinical practice to estimate glioblastoma volume and to evaluate tumor size evolution.

www.nature.com/scientificreports/
Magnetic resonance imaging (MRI) is the gold standard radiological examination for assessing the response to treatment through tumor size monitoring. An optimal tumor size evaluation is essential to evaluate progression and propose the most relevant therapeutic strategy. The evaluation of the response to treatment is essentially based on tumor size evolution, which is often used as an endpoint of clinical studies 2 . Traditionally, tumor size is estimated by a cross-sectional 2D method, as the product of the largest perpendicular diameters on T1-weighted contrast-enhanced MRI, rather than by a 1D method 3 . In 1990, MacDonald et al. were the first to propose criteria for treatment response assessment based on the evolution of 2D measurements of the enhancing tumor area 4 . In 2009, the RANO working group published the RANO criteria and defined four groups of GBM treatment response: complete response, partial response, stable disease, and progressive disease. The RANO radiological criteria included the tumor size evolution in 2D, obtained by calculating the sum of the products of the largest diameters of measurable lesions (at least 10 mm), and classified a tumor as "progressive disease" when the 2D size increased by at least 25% or when the lesion on T2-weighted fluid-attenuated inversion recovery (T2-FLAIR) images increased, although no measurement guidelines were given for the latter 5 .
This size measurement method was criticized because GBM is usually an irregular tumor with a cystic area, a surgical cavity, hemorrhage and no sharp demarcation, all of which could compromise the size estimation and lead to errors in therapeutic decisions [6][7][8] . Consequently, the volumetric approach seemed more appropriate to obtain a more accurate estimation of tumor size 6 . In 2017, Ellingson et al. published modified RANO criteria to estimate GBM evolution. In addition to the two-dimensional measurement, a volumetric approach was described, and an increase of 40% or more in the total tumor volume on two sequential MRI scans separated by 4-8 weeks defined a "durable progression disease" 9 . However, no detail of the measurement method was described 9,10 .
With improvements in imaging techniques and higher resolution, complex tumors such as GBM can be measured more easily and precisely. However, depending on the method of measurement, the cost, the expertise, the complexity and the time required to reach results are highly variable 7,11 . Many possibilities are available to measure the volume of a tumor: manual segmentation, semi-automated segmentation, automated segmentation or calculation methods 12 . Despite the numerous and heterogeneous advanced techniques currently available for tumor volume estimation, their use in daily medical practice remains limited due to a lack of resources and time. To date, no segmentation algorithm has demonstrated its superiority over the others in terms of volume measurement accuracy. Access to these tools is restricted, and often only radiologists use them. For this reason, simple but less precise 1D or 2D methods are often employed to discuss patient management.
To improve the accuracy and reproducibility of measurements, a volumetric approach is necessary. The ellipsoid model was proposed as an acceptable, simple volumetric alternative to the 1D and 2D methods. The ellipsoid method uses the three orthogonal linear diameters of the tumor. Some authors compared different geometric models, such as spheroid, ellipsoid, cylinder or rectangular models, and concluded that the ellipsoid model was the best for estimating tumor volume 13,14 . More complex methods of tumor volume measurement in glioblastoma, such as manual, semi-automated or fully automated segmentation, have been studied with discordant results [15][16][17][18][19] . Although numerous publications on semi-automated or automated segmentation models exist in the recent literature, the algorithms are heterogeneous and lack standardization and availability 15,[20][21][22] . Further research and harmonization are required. Even though manual segmentation is a time-consuming method that can lead to bias and inter-observer variability 15,22 , authors showed that the participation of an expert neuro-oncologist or radiologist in manual segmentation allowed a higher accuracy than automated segmentation for tumor size determination 23 .
This study compared two methods of volume measurement, the ellipsoid model and manual segmentation, to estimate the size of different GBM regions of interest in adult patients, with the aim of proposing an acceptable method of volume measurement that is available to all, reproducible, simple and easy to use in routine clinical situations.

Results
Calculated volume (CV) versus measured volume (MV). The analysis of the calculated and measured volume is summarized in Table 1. CV was significantly larger than MV for each tumor compartment, the contrast-enhancing area (CE), the necrotic area (NEC), the gross target volume (GTV) and the volume of the edema area (FLAIR) (CE p < 0.001, NEC p = 0.01, GTV p = 0.05 and FLAIR p = 0.01).
Response assessment agreement according to the RANO criteria. For the CE size evolution in percent estimated with the CV method, one, three, 23 and 30 patients were classified as complete response (CR), partial response (PR), stable disease (SD) and progressive disease (PD), respectively. For the CE size evolution in percent estimated with the MV method, one, zero, 19 and 37 patients were classified as CR, PR, SD and PD, respectively (Table 2). A total of 41 (72%) patients were classified in the same category with the CV method and the MV method (p < 0.001), revealing substantial agreement between the two volume measurement methods in classifying patients into four groups according to the response to treatment.

Discussion
This study compared the use of a geometric model and manual segmentation to evaluate GBM volume and its evolution after specific treatment. The investigations revealed that the CV, obtained with an ellipsoid formula based on tumor diameters, and the MV, obtained by manual contouring, were highly correlated for the size estimation of all tumor compartments, with a high agreement in classifying patients into RANO treatment response groups. Except for GTV and FLAIR, which are continuous volumes on imaging, ellipsoid calculations of necrosis and CE were more than twice the MV measurements. The mean difference between calculated and measured volume for CE, NEC, GTV and FLAIR was 115.2%, 139.9%, 21.7% and 37.9%, respectively. This could be explained by the fragmentation, shape, blurred borders and irregular presentation of these volumes 14 , which required summing several ellipsoid calculations and therefore accumulated the overestimation inherent to each ellipsoid calculation. However, the differences between these ellipsoid volumes and the MV measurements were lower than the differences obtained with spheroid and rectangle calculations (data not shown). Consequently, a correlation coefficient from 0.90 to 0.95 was obtained for all the regions of interest, but the intermethod agreement was poor for CE (ICC = 0.33) and NEC (ICC = 0.56). The literature 14 . In the present study, one observer (a radiation oncologist) provided the CV and MV measurements, in contrast to some studies in which the delineations of two or more observers were used for comparison 14,[16][17][18][24][25][26] . This single-observer approach improved homogeneity and removed inter-observer variability. Furthermore, all MRI scans were performed on one scanner with the same imaging parameters, which further limited variability. The 2D measurement was very simple and fast to use and was considered suitable for routine clinical practice without the need for specific software 11 .
However, the intra- and inter-observer variabilities were high, and measurements could lack objectivity and reproducibility 23,27 and provide a worse estimation of tumor response to treatment 24,25,28,29 . Moreover, the 2D measurement indirectly included the cystic and surgical cavity 30 , although the RANO criteria stated their exclusion 31 . With the recent development of novel therapies causing pseudoresponse and pseudoprogression, the treatment response assessment needed more accuracy and reliability and less interobserver variability 32 . For this reason, the volumetric approach sparked interest 33 . With a volumetric measurement, GBM boundaries were respected and cystic and necrotic areas were excluded 7 . To replace the 2D measurement method with a simple and reproducible method, geometric models were studied and some authors showed that the ellipsoid formula was more accurate than spheroid, cylinder or rectangular models 13,14,26 . Therefore, for the CV method, the ellipsoid formula was chosen as the geometric model to estimate the tumor volume because of its simplicity in the everyday clinical setting. However, some authors concluded that the ellipsoid formula remained insufficient, with a higher intra- and inter-rater variability and less sensitivity for analyzing early progression and the evolution of small lesions 13,27 .
For the MV method, manual segmentation was used. This method was time consuming 17,34,35 , expensive and potentially subjective. However, tumor segmentation was a complex task that required considerable experience and competence to appreciate mixed areas, cystic and surgical cavities, necrosis, shape, and border enhancement, which were not always well defined and not always reproducible from one software package to another with semi-automated or fully automated methods 12,15,22 ; the value of such software remains disputable 15,23 .
The GBM response to treatment was routinely evaluated by conventional MRI, based on the change in tumor size. For a standardized response evaluation, the RANO criteria were used to classify patients 5 . Radiological criteria were based on the contrast-enhancing lesion and FLAIR and excluded cystic and surgical cavities. However, some limitations persisted. The assessment of contrast-enhancing lesions was based on a 2D measurement, not on a volumetric approach. The FLAIR lesion assessment was not defined by a percentage of change and depended on the neuroradiologist's appreciation 36 . Ellingson et al. proposed modified RANO criteria to evaluate the radiological response with a volumetric approach, considering only contrast-enhancing lesions, with a threshold of a 40% increase in volume for PD, but without a recommendation on how to measure the volume 9 . In the current study, the classification of patients was exclusively based on CE volume changes, as suggested by the modified RANO criteria.
Some authors studied the impact of volume measurement methods on the assessment of response to treatment in GBM patients 11,16,17,22 . However, comparisons were made only between the different studied methods 17,22 , with RECIST evaluation 16 or with RANO response 11 . In this study, when modified RANO response groups were used according to the volume percentage changes, the CV and MV had a substantial agreement of 72% (K = 0.66), which revealed that manual segmentation did not improve the determination of patient response according to modified RANO criteria compared with the ellipsoid model. This agreement was excellent for the two patients with CR and PR. Differences can be observed in SD and PR agreement, where the ellipsoid calculation was more "optimistic" than the MV evaluation. This could be the consequence of (i) the difference in initial values obtained by the two methods (i.e. ellipsoid and MV methods) and (ii) the fact that progression was evaluated only on CE volumes and not on all regions of interest.
The current study revealed no difference in OS between patients with and without PD, whichever volume method was used. Firstly, this could be explained by the fact that patients always relapse and the time between relapse and death is always short, so the time of relapse has only a low impact on OS. Secondly, the number of patients in this series may also explain the absence of difference. Thirdly, this absence of difference could itself be an argument for using the ellipsoid method. However, in the series showing a significant relationship between measurements and OS, the number of patients was lower than or equivalent to that of our series, but the cut-off used to conclude that progression had occurred was highly different, making comparison between series very challenging 11,[37][38][39][40] .
Like many studies in the literature, this study was retrospective, with a relatively small sample size and a lack of statistical power. Another limitation of this type of study was the lack of an in vivo reference standard for brain tumor volume measurement, so the true tumor volume remained unknown. Despite the development of numerous automated segmentation methods and more than 20 years of research, computer-assisted methods remained challenging and clinical research is still needed to homogenize practice 41 . Finding a pertinent tool that is relevant in routine practice, easy to use and adapted to therapeutic management and clinical trial design was a real challenge. In addition to conventional MRI analysis, tumor size determination using advanced and multimodal MRI appeared promising 20,42,43 . To further improve the assessment of GBM, machine learning models were developed 44 . Indeed, an MRI contains over a million voxels, which constitutes complex "big data" to manage, and deep learning methods for segmentation, survival prediction or brain tumor grading remain to be developed 45,46 . Therefore, quantitative features such as textural and geometric data could be explored, combined with genomics, proteomics and clinical data and compiled into diagnostic, prognostic, and therapeutic models 47 .

Methods
This study was approved by the center's institutional review board. All methods were performed in accordance with the relevant guidelines. Informed consent was obtained from all the patients included in this study.

Population.
A total of 139 patients with newly diagnosed and histologically confirmed GBM were identified between January 2015 and December 2017 and reviewed in this single-center retrospective study. Inclusion criteria consisted of (1) age 18 years or older, (2) histopathological confirmation of GBM, (3) completion of the entire course of CRT with TMZ after maximal surgery according to the EORTC/NCIC protocol 5 , and (4) MRI follow-up until progression. Thirty-eight patients were excluded because of hypofractionated RT schedules, 15 because of another chemotherapy protocol (bevacizumab), eight died before progression, seven had no progression at the time of data collection, five were lost to follow-up, five had a histological diagnosis of gliosarcoma, three had no MRI examination during the follow-up, and one had a history of cerebral irradiation. Finally, 57 patients were included in the study. Patient ages ranged from 24 to 81 years with a median age of 62 years. Forty patients (70%) were male and 17 patients (30%) were female. Two MRIs per patient were evaluated: the MRI performed before CRT (dosimetric MRI performed 4-6 weeks after surgery to plan radiotherapy) and the MRI on which a first progression was suspected. The examination was performed on a Signa Excite HDx 3.T™ system (GE Healthcare, Milwaukee, WI) with an 8-channel dedicated head coil. The MRI scanning protocol included pre- and postcontrast 1-mm, 3-dimensional (3D) volumetric T1-weighted multi-echo magnetization-prepared rapid-acquisition gradient echo (MPRAGE) sequences and T2-FLAIR images. The MRI showing a suspicion of progression was performed at a mean of 23.6 weeks after the completion of CRT.
Recorded data. For each MRI, the volume of different GBM regions of interest was evaluated (Fig. 3). On the T1-weighted contrast-enhanced MPRAGE sequence, the CE, the NEC and the GTV, which included the CE, the NEC and the surgical cavity, were obtained. On the T2-FLAIR sequence, the volume of the edema area (FLAIR) was measured. On the MRI performed before CRT, one, two or three CE areas were seen in 48, five and four patients, respectively. When a region of interest was fragmented, an ellipsoid calculation was performed for each fragment and the overall calculated volume was the sum of the ellipsoid volumes of all fragments. Two methods of volume measurement were used to define the volume of the different GBM regions of interest and were compared.
The CV for each compartment of the tumor was obtained with the ellipsoid volume formula V = π/6 × D1 × D2 × D3, where D1, D2 and D3 corresponded to the largest diameters of the compartment measured in the three orthogonal planes (axial, sagittal and coronal reformations).
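As an illustration of the calculated method, the following minimal Python sketch applies the ellipsoid formula and sums the result over fragments of a region of interest. The function names, the assumption that diameters are entered in millimetres, and the example diameters are hypothetical and are not taken from the study data.

```python
import math


def ellipsoid_volume_cm3(d1_mm, d2_mm, d3_mm):
    """Ellipsoid volume pi/6 * D1 * D2 * D3, converted from mm^3 to cm^3.

    Assumption: the three orthogonal diameters are given in millimetres.
    """
    volume_mm3 = math.pi / 6.0 * d1_mm * d2_mm * d3_mm
    return volume_mm3 / 1000.0  # 1 cm^3 = 1000 mm^3


def calculated_volume_cm3(fragments):
    """Sum the ellipsoid volumes of every fragment of one region of interest.

    `fragments` is a list of (D1, D2, D3) tuples, one tuple per fragment.
    """
    return sum(ellipsoid_volume_cm3(d1, d2, d3) for d1, d2, d3 in fragments)


# Hypothetical example: a single lesion, then the same lesion plus a satellite.
single_lesion = calculated_volume_cm3([(30.0, 20.0, 10.0)])
fragmented = calculated_volume_cm3([(30.0, 20.0, 10.0), (10.0, 10.0, 10.0)])
print(f"single lesion: {single_lesion:.2f} cm^3")
print(f"fragmented region: {fragmented:.2f} cm^3")
```

Because each fragment is approximated by its own ellipsoid, any overestimation made on an irregular fragment is added to the total, which is consistent with the larger CV values reported for fragmented compartments.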
The MV resulted from the manual delineation of each compartment on all MRI slices (slice by slice) with the FocalSim™ (Elekta ® , Stockholm, Sweden) contouring software. After contouring, the volume was automatically calculated by the software for each compartment. For patients with multifocal lesions, all lesions were measured with the two methods separately and summed for the comparison. For the CE, the analyzed volume corresponded to the sum of the measurable lesions (which had at least two diameters greater than 10 mm) according to the RANO criteria.
The measurements were performed by a single radiation oncology resident with 6 years of experience (CL) and corrected by two reviewers with over 20 years of experience, an expert neuroradiation oncologist (GN) and an expert neuroradiologist (JMC). Any disagreements between the two reviewers were resolved through discussion among the three investigators, and any corrections were adopted by consensus. Finally, all measurements were approved by the two reviewers.
Evaluation of the response according to the RANO criteria. For the classification of the response to treatment only, the MRI showing the best response to treatment was also examined in seven patients (data not shown).
The response categories (CR, PR, SD or PD), defined according to the RANO 5 and the modified RANO criteria 10 , were applied only to the CE, by comparing the MRI performed before CRT with the MRI showing a suspicion of progression for 50 patients, and the MRI showing the best response after treatment with the MRI showing a suspicion of progression for seven patients 3,5,6,9,17,28 . For each patient, the classification based on the CV and the classification based on the MV were compared. CR, PR, SD and PD were defined according to Ellingson et al. (CR: 100% decrease; PR: ≥ 65% decrease; PD: ≥ 40% increase; SD: any change between a 65% decrease and a 40% increase) 9 .
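The volumetric thresholds above can be expressed as a short decision rule. The sketch below is a simplified illustration only: it assumes that SD covers every change lying between the PR and PD thresholds, it uses hypothetical function names and volumes, and it ignores the clinical, steroid-dose and confirmation-scan components of the full RANO assessment.

```python
def percent_change(baseline_cm3, followup_cm3):
    """Relative change in CE volume between two MRI scans, in percent."""
    if baseline_cm3 <= 0:
        raise ValueError("baseline volume must be positive")
    return 100.0 * (followup_cm3 - baseline_cm3) / baseline_cm3


def classify_response(baseline_cm3, followup_cm3):
    """Classify a CE volume change with the volumetric thresholds quoted above.

    Assumption: SD is every change between a 65% decrease and a 40% increase.
    """
    change = percent_change(baseline_cm3, followup_cm3)
    if followup_cm3 == 0:
        return "CR"  # 100% decrease
    if change <= -65.0:
        return "PR"  # decrease of at least 65%
    if change >= 40.0:
        return "PD"  # increase of at least 40%
    return "SD"


# Hypothetical patient: the same pair of scans measured with both methods.
cv_category = classify_response(baseline_cm3=12.0, followup_cm3=15.0)  # +25%
mv_category = classify_response(baseline_cm3=6.0, followup_cm3=9.0)    # +50%
print(cv_category, mv_category)
```

In this hypothetical case the two methods disagree (SD with the calculated volume, PD with the measured volume) even though both volumes grew by 3 cm^3, which shows how different baseline values can move a patient across a percentage threshold.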
Statistical analysis. The comparison was performed in volume (cm³) and percent variations between the two methods of measurement. The correlation between the two volume measurement methods for each compartment was analyzed with Pearson correlation coefficients and 95% confidence intervals (95% CIs). For each type of compartment, the ICC estimated the interrater reliability of measurements as follows: poor reliability for ICC < 0.50, moderate reliability for ICC 0.50-0.75, good reliability for ICC 0.75-0.90 and excellent reliability when the ICC > 0.90. The comparison of the two measurement methods was represented by a Bland-Altman plot for each compartment. The agreement of the patient's response to treatment category of the two volume measurement methods was assessed using Cohen's weighted kappa statistic with the 95% CI, with the kappa value ranging from −1 to +1. A kappa value between 0.01 and 0.20 indicated no or slight agreement, between 0.21 and 0.40 indicated fair agreement, between 0.41 and 0.60 indicated moderate agreement, between 0.61 and 0.80 indicated substantial agreement and between 0.81 and 1.00 indicated almost perfect agreement.
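To make these agreement statistics concrete, the following Python sketch computes a Pearson correlation coefficient, the Bland-Altman bias with 95% limits of agreement, and a weighted Cohen's kappa using only the standard library. It is an illustration under stated assumptions, not the analysis code of the study: the function names are hypothetical, the kappa uses linear weights (the weighting scheme is not specified in the text), and the input values are invented.

```python
import math
from statistics import mean, stdev

CATEGORIES = ["CR", "PR", "SD", "PD"]  # ordered from best to worst response


def pearson_r(x, y):
    """Pearson correlation coefficient between paired measurements."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) of agreement for x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def weighted_kappa(labels_a, labels_b, categories=CATEGORIES):
    """Cohen's kappa with linear disagreement weights (an assumption here)."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(labels_a)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(labels_a, labels_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            num += weight * observed[i][j]
            den += weight * row[i] * col[j]
    if den == 0.0:
        return 1.0  # both methods put every patient in the same single category
    return 1.0 - num / den


# Invented volumes (cm^3) and categories for four hypothetical patients.
cv = [12.0, 30.0, 8.0, 20.0]
mv = [6.0, 22.0, 5.0, 15.0]
print(pearson_r(cv, mv), bland_altman(cv, mv))
print(weighted_kappa(["SD", "SD", "PD", "PD"], ["SD", "PD", "PD", "PD"]))
```

A high Pearson coefficient only indicates that the two methods rank tumors similarly; the Bland-Altman bias is what exposes a systematic overestimation by one method, which is why both are reported side by side.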
A survival analysis was performed to evaluate the impact of the volume measurement methods on OS according to the response to treatment expressed by the radiological RANO classification (CE volume evolution). Patients were classified as PD or non-PD (CR, PR or SD). OS was determined from the date of pathological diagnosis to death or the last follow-up. OS in PD and non-PD patients was compared with a log-rank test, and Kaplan-Meier survival curves 48,49 were drawn for each group according to the measurement methods. A Cox regression analysis 50 was performed to compare the ability of the two measurement methods to predict OS, with the determination of hazard ratios (HRs) and their 95% confidence intervals (95% CIs).
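For readers unfamiliar with the Kaplan-Meier estimator, the sketch below shows the product-limit calculation in plain Python. The function name, the follow-up times and the event indicators are hypothetical; the log-rank test and the Cox regression are not reproduced here and would in practice be run with a dedicated statistics package.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up duration of each patient (any consistent unit)
    events: 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up (months) for five patients of one response group.
curve = kaplan_meier(times=[5, 8, 12, 12, 20], events=[1, 0, 1, 1, 0])
for t, s in curve:
    print(f"t = {t} months, estimated survival = {s:.3f}")
```

Running the estimator once on the PD group and once on the non-PD group, for each measurement method, yields the curves that the log-rank test then compares.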
Ethics approval. This study received institutional ethics board approval from the research committee of the ICANS comprehensive cancer center.
Consent to participate/consent for publication. Written informed consent was obtained from the patient for the publication of this report.

Conclusions
GBM evaluation should not be ambiguous or complex for clinical management and clinical trials, and a volumetric approach seemed more realistic. The development, standardization and accessibility of segmentation methods should be encouraged. The current study showed a high concordance between manual segmentation and the ellipsoid formula in defining the volumes of GBM compartments, with a good agreement in classifying the patient response to treatment according to the four RANO groups, suggesting that the ellipsoid formula could be used in clinical practice. However, CV measurements were significantly larger than MV measurements and the interrater reliability for CE volume definition was poor. Therefore, the categorization of treatment response with the ellipsoid formula should be performed with caution, and segmentation methods should be preferred when making therapeutic decisions.

Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.